Data description
In this post, we wrangle our hospital data a bit and then describe it, focusing on:
characteristics of exiting hospitals and
elements of our data that may warrant additional consideration in standard ML models.
We’ve added another five years to our data: we now observe the population of U.S. hospitals (~6,300) from 2003-2017.
Our predictor variables are drawn from five categories:
hospital characteristics: such as number of beds, admissions, total and technological services; non-profit & teaching status; Medicaid share; market share; age
market: population, poverty rate, number of hospitals, and each hospital’s distance from the nearest 100-bed hospital
ownership structure: whether the hospital is owned by a system or vertically integrated
financial: revenue, growth, profit margins, uncompensated care costs, cash, debt
lagged variables: outcome, ownership, and financial history
Our data is already tidy, but we spend some time cleaning up the financial variables. HCRIS (Healthcare Cost Report Information System) data are extracted from unaudited financial records submitted to the Centers for Medicare & Medicaid Services (CMS). They’re the most complete source of provider financial data, but they are noisy. Prior literature typically winsorizes outliers and unreasonable values. We also create a net debt variable and financial growth variables.
library(dplyr)
#drop observations outside the year-specific 1st and 99th percentiles (or missing) on any financial variable
var_vec <- c('ptnt_mgn', 'ni_mgn', 'liquid', 'uncomp', 'uncomp_mgn', 'capex', 'capex_mgn', 'rev_adm', 'tot_assets', 'ptnt_opex', 'oth_costs', 'tot_costs', 'levg')
#year-specific cutoffs, named {variable}_p1 and {variable}_p99
clean_p1_p99 <- cleaned %>%
  group_by(year) %>%
  summarise_at(var_vec, .funs = list(p1 = ~quantile(., .01, na.rm = TRUE), p99 = ~quantile(., .99, na.rm = TRUE)))
clean_p1_p99_merged <- inner_join(cleaned, clean_p1_p99, by = 'year')
#accumulate the flag across variables so that each pass does not overwrite the previous one
cleaned$rmflag <- FALSE
for (varname in var_vec) {
  x <- cleaned[[varname]]
  cleaned$rmflag <- cleaned$rmflag | is.na(x) |
    x < clean_p1_p99_merged[[paste0(varname, '_p1')]] |
    x > clean_p1_p99_merged[[paste0(varname, '_p99')]]
}
cleaned <- cleaned %>% filter(!rmflag)
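#create a net debt variable
#(built on cleaned rather than hcris, so the outlier filter above is not discarded)
cleaned <- mutate(cleaned, netdebt = debt - cash)
#create growth variables: year-over-year change relative to the current-year value
#(lag() takes the previous row, so rows must be sorted by year within each hospital)
cleaned <- cleaned %>%
  group_by(num_prvdr_num) %>%
  mutate(across(c(rev_tot, rev_netptnt, admtot, net_income, ptnt_income, exptot), list(ch = ~(.x - lag(.x)) / .x)))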
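The cleaning code above drops observations outside the yearly 1st to 99th percentile range. For comparison, winsorizing in the sense used by prior literature keeps those rows and clamps their values to the cutoffs. The following is only a sketch on made-up data: `winsorize()` is a hypothetical helper we define here, and the toy values are not from HCRIS.

```r
library(dplyr)

# Toy panel: the column names mirror ours, the values are made up.
toy <- tibble(
  year     = rep(c(2003, 2004), each = 5),
  ptnt_mgn = c(-50, -0.1, 0.0, 0.1, 40, -30, -0.2, 0.05, 0.2, 60)
)

# Hypothetical helper: clamp x to its [lo, hi] quantiles instead of dropping rows.
# Missing values stay missing.
winsorize <- function(x, lo = 0.01, hi = 0.99) {
  q <- quantile(x, probs = c(lo, hi), na.rm = TRUE)
  pmin(pmax(x, q[[1]]), q[[2]])
}

# Apply the clamp within each year, as the percentile cutoffs above are year-specific.
toy_w <- toy %>%
  group_by(year) %>%
  mutate(ptnt_mgn_w = winsorize(ptnt_mgn)) %>%
  ungroup()
```

Unlike the filter, this keeps every hospital-year, which matters in a panel where dropped rows create gaps in a hospital's history.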
Our key outcome variable is a hospital’s market participation decision in each year. Mapping hospital exits (by closure, acquisition, or conversion) suggests that they occur in both rural and urban areas and are not concentrated within any one region.
#here's how we made the above gif
library(usmap)
library(ggplot2)
library(gganimate)
#plot exiting hospitals & rural counties
p <- plot_usmap(data = rural, values = "code", color = "lightskyblue1", size = .001) +
  geom_point(data = closures, aes(x = long.1, y = lat.1), color = "gray0", shape = 16, size = 1) +
  scale_fill_continuous(low = "deepskyblue3", high = "white", name = "Rural", labels = scales::comma) +
  #labs(title = "U.S. hospital closures, 2003-2017") +
  theme(legend.position = "right")
#animate the map one year at a time
anim <- p + transition_states(year, transition_length = 0, state_length = 10) +
  enter_fade() +
  exit_fade() +
  ggtitle("U.S. hospital closures, 2003-2017", subtitle = '{closest_state}')
animate(anim, duration = 20, fps = 5, renderer = magick_renderer())
anim_save("closure_map.gif", animation = last_animation())
The rate of hospital exits slowed after 2005, from ~4% to ~1% of hospitals exiting each year. Closure is the most common form of exit; conversion and absorption (being acquired and then closed) are even rarer. About 4% of hospitals are bought each year throughout the period. Somewhat surprisingly, neither the Great Recession nor the Affordable Care Act appears to have affected hospitals’ average investment, closure, and entry patterns.
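Annual rates like these are the yearly mean of a 0/1 indicator. A minimal sketch on made-up data, assuming a hospital-year panel with an indicator column named `exit` (the name and the values here are illustrative, not our actual data):

```r
library(dplyr)

# Toy hospital-year panel: four hospitals observed in each of two years.
toy <- tibble(
  year = rep(c(2004, 2010), each = 4),
  exit = c(1, 0, 0, 0, 0, 0, 0, 0)
)

# Share of hospitals exiting in each year.
exit_rates <- toy %>%
  group_by(year) %>%
  summarise(n_hosp = n(), exit_rate = mean(exit))
```

The same pattern would apply to the share of hospitals bought each year, with the relevant indicator swapped in.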
How do hospitals that exit differ from ones that don’t?
Exiting hospitals are located in markets with higher poverty and uninsurance rates, and are less often in states that expanded Medicaid. They are smaller and provide less complex care; they are more likely to be for-profit or public and less likely to be system-owned. Exiting hospitals are about as likely as non-exiting hospitals to be in rural areas; however, very few small, rural (“Critical Access”) hospitals close. On average, exiting hospitals are within 15 miles of a 100-bed hospital, compared with 17 miles for non-exiting hospitals. They have lower occupancy rates, profit margins, debt and cash levels, and capital expenditures.
| variable (grouped by pre_out_ex) | 0 (non-exiting) | 1 (exiting) |
|---|---|---|
| num_prvdr_num | 264499 | 274205 |
| tot_pop | 699935 | 765226 |
| white | 0.780 | 0.754 |
| highschool | 0.311 | 0.312 |
| college | 0.159 | 0.147 |
| unempl | 0.0737 | 0.0758 |
| med_inc | 55048 | 52373 |
| uninsur | 0.129 | 0.148 |
| public_insur | 0.329 | 0.329 |
| private_insur | 0.662 | 0.637 |
| elderly | 0.146 | 0.141 |
| poverty | 0.151 | 0.159 |
| male | 0.495 | 0.494 |
| own | 2.01 | 2.17 |
| mcaid_exp | 0.137 | 0.060 |
| wage_index | 0.986 | 0.960 |
| tacmi | 1.47 | 1.23 |
| bought | 0.0320 | 0.0449 |
| sysowned | 0.588 | 0.542 |
| bought_is | 0.0153 | 0.0233 |
| bdtot | 161 | 93 |
| admtot | 6573 | 2681 |
| ipdtot | 39229 | 21088 |
| paytot | 52527291 | 19050288 |
| exptot | 1.32e+08 | 4.60e+07 |
| fte | 891 | 364 |
| teach | 0.0548 | 0.0112 |
| catholic | 0.1099 | 0.0881 |
| cah | 0.234 | 0.052 |
| minorteach | 0.231 | 0.139 |
| rural | 0.363 | 0.367 |
| mcare | 0.480 | 0.487 |
| mcaid | 0.164 | 0.149 |
| vi | 0.452 | 0.322 |
| tot_services | 45.8 | 27.7 |
| tech_services | 0.169 | 0.136 |
| mh_services | 0.0669 | 0.0613 |
| bought_ss | 0.0167 | 0.0216 |
| hsa_sh | 0.575 | 0.451 |
| hosp_hsa | 5.74 | 7.36 |
| hrr_sh | 0.0567 | 0.0220 |
| hosp_hrr | 34.7 | 37.1 |
| dist2hosp | 17.3 | 15.4 |
| ch_tot_serv | 0.03038 | 0.00647 |
| occ | 0.573 | 0.519 |
| age_hosp | 27 | 23 |
| age_sys | 7.94 | 6.55 |
| exit | 0 | 0 |
| enter | 0.0202 | 0.0400 |
| switch | 0.0110 | 0.0184 |
| switch2np | 0.00765 | 0.01041 |
| invest | -0.0134 | 0.0753 |
| cont | 0.98 | 0.96 |
| lat | 38.0 | 36.8 |
| long | -92.5 | -91.9 |
| uncomp | 22.7 | 25.5 |
| dsh | 0.489 | 0.500 |
| dsh_pct | 0.152 | 0.143 |
| dsh_adj | 3.72 | 1.53 |
| ptnt_opex | 157.6 | 51.9 |
| oth_costs | 5.17 | 2.76 |
| tot_costs | 159.5 | 52.8 |
| rev_tot | 170 | 52 |
| rev_netptnt | 156.6 | 47.8 |
| net_income | 7.22 | -1.49 |
| ptnt_income | -2.26 | -3.91 |
| cash | 18.65 | 2.82 |
| debt | 24.67 | 7.61 |
| tot_assets | 191.3 | 46.9 |
| fa_tot | 77.2 | 19.8 |
| capex | 10.29 | 3.23 |
| liquid | 28.14 | 2.72 |
| rev_adm | 0.0410 | 0.0471 |
| levg | -10.60 | -7.11 |
| ptnt_mgn | -364 | -146 |
| ni_mgn | -364 | -146 |
| uncomp_mgn | 0.104 | 0.175 |
| capex_mgn | 33.3317 | 0.0825 |
| outcome_ex | 0 | 0 |
| has_exit | 0 | 1 |
| netdebt | 6.02 | 4.79 |
| rev_tot_ch | 0.0128 | -0.0818 |
| rev_netptnt_ch | 0.00953 | -0.09110 |
| admtot_ch | -0.0816 | -1.4111 |
| net_income_ch | 62.8 | -29.4 |
| ptnt_income_ch | NaN | -0.0655 |
| exptot_ch | -0.00655 | -0.05082 |
| n | 77142 | 1249 |
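A table of this shape can be produced by averaging every variable within each group and then transposing. The sketch below uses made-up data and is one possible way to do it, not necessarily the exact code behind the table above; only `pre_out_ex`, `bdtot`, and `occ` are names taken from our data.

```r
library(dplyr)
library(tidyr)

# Toy data: values are made up for illustration.
toy <- tibble(
  pre_out_ex = c(0, 0, 0, 1, 1),
  bdtot      = c(150, 170, 160, 90, 100),
  occ        = c(0.60, 0.55, 0.58, 0.50, 0.52)
)

# One row per group: the mean of every other column, plus the group size.
group_means <- toy %>%
  group_by(pre_out_ex) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)), n = n())

# Transpose so that variables are rows and groups are columns.
summary_tbl <- group_means %>%
  pivot_longer(-pre_out_ex, names_to = "variable", values_to = "value") %>%
  pivot_wider(names_from = pre_out_ex, values_from = value)
```

Here `summary_tbl` has one row per variable and one column per value of `pre_out_ex`.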
Note that in our data, some non-profit hospitals do not exit even after many years of negative profits.
Panel data: We’re looking into ways to accommodate panel data in ML models, e.g., Generalized linear mixed-model (GLMM) trees.
Variable collinearity: Many of our financial variables are highly correlated (blue in the corrgram below), and most variables are autocorrelated. We hope to lean on CART (and other ML algorithms) for variable selection; collinearity should be less problematic for these algorithms than it is for linear models.
Missing years of data: In our data, missing values often occur during financial distress or before a hospital exits a market, so missingness may itself be informative.
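One way to make missing years visible to a model is to flag gaps in each hospital's sequence of reporting years. This is a sketch on made-up data under our own assumptions, not a step we have settled on; `yrs_since_last` and `after_gap` are hypothetical column names.

```r
library(dplyr)

# Toy panel: hospital 2 has no record for 2004.
toy <- tibble(
  num_prvdr_num = c(1, 1, 1, 2, 2, 2),
  year          = c(2003, 2004, 2005, 2003, 2005, 2006)
)

# Within each hospital, measure the distance to the previous observed year
# and flag rows that follow a gap of more than one year.
gaps <- toy %>%
  arrange(num_prvdr_num, year) %>%
  group_by(num_prvdr_num) %>%
  mutate(
    yrs_since_last = year - lag(year),
    after_gap      = !is.na(yrs_since_last) & yrs_since_last > 1
  ) %>%
  ungroup()
```

A flag like `after_gap` could then enter the model as a predictor, letting a tree split on whether a hospital recently stopped reporting.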